Abstract

Similarity measurements between 3D objects and 2D images are useful for the tasks of
object recognition and classification. We distinguish between two types of similarity metrics:
metrics computed in image-space (image metrics) and metrics computed in transformation-space
(transformation metrics). Existing methods typically use image metrics; namely, metrics
that measure the difference in the image between the observed image and the nearest view
of the object. An example of such a measure is the Euclidean distance between feature points
in the image and their corresponding points in the nearest view. (Computing this measure is
equivalent to solving the exterior orientation calibration problem.) In this paper we introduce a
different type of metric: transformation metrics. These metrics penalize for the deformations
applied to the object to produce the observed image.

We  present  a  transformation  metric  that  optimally  penalizes  for  “affine  deformations”  under 
weak-perspective.  A  closed-form  solution,  together  with  the  nearest  view  according  to  this 
metric,  are  derived.  The  metric  is  shown  to  be  equivalent  to  the  Euclidean  image  metric,  in 
the  sense  that  they  bound  each  other  from  both  above  and  below.  For  the  Euclidean  image 
metric  we  offer  a  sub-optimal  closed-form  solution  and  an  iterative  scheme  to  compute  the  exact 
solution. 


1  Introduction 


Object recognition is a process of selecting the object model that best matches the observed
image. A common approach to recognition uses features (such as points or edges) to represent
objects. An object is recognized in this approach if there exists a viewpoint from
which the model features coincide with the corresponding image features, e.g. [Roberts, 1965,
Fischler and Bolles, 1981, Lowe, 1985, Huttenlocher and Ullman, 1987, Basri and Ullman, 1988,
Thompson and Mundy, 1987, Ullman and Basri, 1991]. Since images often are noisy and models
occasionally are imperfect, it is rarely the case that a model aligns perfectly with the image.
Systems therefore look for a model that “reasonably” aligns with the image. Consequently,
measures that assess the quality of a match become necessary. Such measures are useful also
for the initial classification of novel objects, by associating the new object to the most similar
object in the library.

A common measure for comparing 3D objects to 2D images is the Euclidean distance between
feature points in the actual image and their corresponding points in the nearest view of
the object. The assumption underlying this measure is that images are significantly less reliable
than models, and so perturbations should be measured in the image plane. This assumption
often suits recognition tasks. Other measures may better suit different assumptions. For example,
when classifying objects, there is an inherent uncertainty in the structure of the classified
object. One may therefore attempt to minimize the amount of deformation applied to the
object to account for this uncertainty. Such a distance is measured in transformation space
rather than in image space. A definition of these two types of measures is given in Section 3.

Measures  to  compare  3D  models  and  2D  images  generally  are  desired  to  have  metrical 
properties;  that  is,  they  should  monotonically  increase  with  the  difference  between  the  measured 
entities.  (A  more  exact  definition  is  given  in  Appendix  A.)  The  Euclidean  distance  between 
the  image  and  the  nearest  view  defines  a  metric.  (We  refer  to  this  measure  as  the  image 
metric.)  The  difficulty  with  employing  this  measure  is  that  a  closed-form  solution  to  the 
problem  has  not  yet  been  found,  and  therefore  currently  numerical  methods  must  be  employed 
to  compute  the  measure.  A  common  method  to  achieve  a  closed-form  metric  is  to  extend  the 
set  of  transformations  that  objects  are  allowed  to  undergo  from  the  rigid  to  the  affine  one.  The 
problem  with  this  measure  is  that  it  bounds  the  rigid  measure  from  below,  but  not  from  above. 
Other  methods  either  achieve  only  sub-optimal  distances,  or  they  do  not  define  a  metric.  The 
existing  approaches  are  reviewed  in  Section  2. 


This paper presents a closed-form distance metric to compare 3D models and 2D images.
The metric penalizes for the non-rigidities induced by the optimal affine transformation that
aligns the model to the image under weak-perspective projection. The metric is shown to bound
the least-square distance between the model and the image both from above and below. We
foresee three ways to use the metric developed in this paper:

1. Obtain a direct assessment of the similarity between 3D models and 2D images.

2. Obtain lower and upper bounds on the image metric. In many cases such bounds may
suffice to unequivocally determine the identity of the observed object.

3.  Provide  an  initial  guess  to  be  then  used  by  a  numerical  procedure  to  solve  the  image 
distance. 

The rest of this paper is organized as follows: In Section 2 we review related work. In
Section 3 we define the concepts used in this paper. In Section 4 we summarize the main
results of this paper. These results are discussed in detail and proved in Section 5 for the
transformation metric, and Section 6 for the image metric. Sections 5 and 6 can be omitted
on a first reading. Finally, in Section 7 we compare the distances between 3D objects and 2D
images, obtained by alignment, to our results.

2  Previous  approaches 

Previous  approaches  to  the  problem  of  model  and  image  comparison  using  point  features  are 
divided  into  three  major  categories: 

1.  Least-square  minimization  in  image  space. 

2.  Sub-optimal  methods  using  correspondence  subsets. 

3.  Invariant  functions. 


The traditional photometric approach to the problem of model and image comparison involves
retrieving a view of the object that minimizes the least-square distance to the image.
This  problem  is  referred  to  as  the  exterior  orientation  calibration  problem  (or  the  recovery  of  the 
hand-eye  transform)  and  is  defined  as  follows.  Given  a  set  of  n  3D  points  (model  points)  and  a 
corresponding  set  of  n  2D  points  (image  points),  find  the  rigid  transformation  that  minimizes 
the  distance  in  the  image  plane  between  the  transformed  model  points  and  the  image  points. 
An  analytic  solution  to  this  problem  has  not  yet  been  found.  (Analytic  solutions  to  the  absolute 
orientation  problem,  the  least-square  distance  between  pairs  of  3D  objects,  have  been  found, 
see  [Horn,  1987,  Horn,  1991].  An  analytic  solution  to  the  least-square  distance  between  pairs 
of 2D images has not yet been found.) Consequently, numerical methods are employed (see
reviews in [Tsai, 1987, Yuan, 1989]). Such solutions often suffer from stability problems, are
computationally intensive, and require a good initial guess.

To avoid using numerical methods, frequently the object is allowed to undergo affine transformations
instead of just rigid ones. Affine transformations are composed of general linear
transformations (rather than rotations) and translations, and they include, in addition to the
rigid transformations, also reflection, stretch, and shear. The solution in the affine case is simpler
than that of the rigid case because the quadratic constraints imposed in the rigid case are
not taken into account, enabling the construction of a closed-form solution. At least six points
are required to find an affine solution under perspective projection [Fischler and Bolles, 1981],
and four are required under orthographic projection [Ullman and Basri, 1991].

The  affine  measure  bounds  the  rigid  measure  from  below.  The  rigid  measure,  however,  is 
not  bounded  from  above,  as  is  demonstrated  by  the  following  example.  Consider  the  case  of 
matching  four  model  points  to  four  image  points  under  weak-perspective.  Since  in  this  case 
there  always  exists  a  unique  affine  solution,  the  affine  distance  between  the  model  and  the  image 
is  zero.  On  the  other  hand,  since  three  points  uniquely  determine  the  rigid  transformation  that 
aligns the model to the image, by perturbing one point we can increase the rigid distance
unboundedly.
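
The first half of this example can be checked numerically. The following is a minimal sketch, assuming NumPy and assuming that translation is absorbed by centring both point sets; the function name `affine_distance` and the random test data are illustrative and not part of the paper. It only shows that the affine residual for four generic points stays at zero however far one image point is moved; the rigid distance, which has no closed form, is not computed here.

```python
import numpy as np

def affine_distance(P, x, y):
    """Squared residual of the least-squares fit of x and y by the columns of P."""
    P_pinv = np.linalg.pinv(P)            # pseudo-inverse of the n x 3 model matrix
    rx = x - P @ (P_pinv @ x)             # residual of the x-coordinates
    ry = y - P @ (P_pinv @ y)             # residual of the y-coordinates
    return float(rx @ rx + ry @ ry)

rng = np.random.default_rng(0)

# Four generic (non-coplanar) model points, centred so the translation drops out.
P4 = rng.normal(size=(4, 3))
P4 -= P4.mean(axis=0)

# Four arbitrary image points, also centred.
q4 = rng.normal(size=(4, 2))
q4 -= q4.mean(axis=0)
d4 = affine_distance(P4, q4[:, 0], q4[:, 1])

# Moving one image point far away still leaves a perfect affine fit.
q4_far = q4.copy()
q4_far[0] += 1000.0
q4_far -= q4_far.mean(axis=0)
d4_far = affine_distance(P4, q4_far[:, 0], q4_far[:, 1])

print(d4, d4_far)   # both are zero up to round-off
```

With only four correspondences the centred image coordinates always lie in the column space of the centred model matrix, which is why the affine measure cannot detect the perturbation.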

A second approach to comparing models to images involves the selection of a small subset
of correspondences (alignment key), solving for the transformation using this subset, and
then transforming the other points and measuring their distance from the corresponding image
points. Three [Fischler and Bolles, 1981, Rives et al., 1981, Haralick et al., 1991] or four
[Horaud et al., 1989] points are required under perspective projection, and three points under
weak perspective [Ullman, 1989, Huttenlocher and Ullman, 1987]. The obtained distance
critically depends on the choice of alignment key. Different choices produce different distance
measures between the model and the image. The results almost always are sub-optimal, since
it is generally better to match all points with small errors than to exactly match a subset of
points and project all the errors onto the others.

A third approach involves the application of invariant functions. Such functions return a constant
value when applied to any image of a particular model. Invariant functions were successfully
used only with special kinds of models, such as planar objects (e.g., [Lamdan et al., 1987,
Forsyth et al., 1991]). More general objects can be recognized using model-based invariant
functions [Weinshall, 1993]. For noise-free data, model-based invariant functions return zero if
the image is an exact instance of the object. To account for noise, the output of these functions
usually is required to be below some fixed threshold. In general, very little research has been
conducted to characterize the behavior of these functions when the model and the image do
not perfectly align. The result of thresholding therefore becomes arbitrary.


3  Definitions  and  notation 

In the following discussion, we assume weak-perspective projection. Namely, the object undergoes
a 3D transformation that includes rotation, translation, and scale, and is then orthographically
projected onto the image. Perspective distortions are not accounted for and are treated as
noise.

In  order  to  define  a  similarity  measure  for  comparing  3D  objects  to  2D  images,  as  discussed 
in  section  1,  we  first  define  the  best-view  of  a  3D  object  given  a  2D  image: 

Definition 1: [best-view] Let d denote a difference measure between two 2D images of n
features. Given a 2D image of an object composed of n features, the best-view of a 3D object
(model) composed of n corresponding features is the view for which the smallest value of d is
obtained. The minimization is performed over all the possible views of the model; the views
are obtained by applying a transformation T, taken from the set of permitted transformations
A, and followed by a projection, Π.

We compute d, the difference between two 2D images of n features, in two ways:

image metric: we measure position differences in the image; namely, it is the Euclidean distance
between corresponding points in the two images, summed over all points.

transformation  metric:  the  images  are  considered  to  be  instances  of  a  single  3D  object. 
The  metric  measures  the  difference  between  the  two  transformations  that  align  the  object 
with  the  two  images.  This  difference  can  be  measured,  for  instance,  by  computing  the 
Euclidean  distance  between  the  matrices  that  represent  the  two  transformations  (when 
the  two  transformations  are  linear). 

As  mentioned  above,  the  measure  d  is  applied  to  the  given  image  and  to  the  views  of  the 
given  model.  These  views  are  generated  by  applying  a  transformation  from  a  set  A  of  permitted 
transformations.  The  view  that  minimizes  the  distance  d  to  the  image  is  considered  as  the  best 
view,  and  the  distance  between  the  best  view  and  the  actual  image  is  considered  as  the  distance 
between  the  object  and  the  image. 

We  consider  in  this  paper  two  families  of  transformations:  rigid  transformations  and  affine 
transformations,  and  we  discuss  the  following  metrics: 

$N_{im}$: a metric that measures the image distance between the given image and the best rigid
view of the object.

$N_{af}$: a metric that measures the image distance between the given image and the best affine
view of the object.

$N_{tr}$: a transformation metric. We assume that the image is an affine view of the object. (When
it is not, we substitute the image by the best affine view.) We look for the rigid view
of the object so as to minimize the difference between the two transformations: the
affine transformation (between the object and the image) and the rigid transformation
(between the object and its possible rigid views). In other words, we look for a view so
as to minimize the amount of “affine deformations” applied to the object.

To illustrate the difference between image metrics and transformation metrics, Figure 1
shows  an  example  of  three  2D  images,  whose  similarity  relations  reverse,  depending  on  which 
kind  of  metric  is  used.  Consider  the  planar  object  in  Figure  1(b)  as  a  reference  object,  and 
assume  A  contains  the  set  of  rigid  transformations  in  2D.  The  images  in  (a)  and  (c)  are  obtained 
by  stretching  the  object  horizontally  (by  9/7)  and  vertically  (by  3/2)  respectively.  (The  image 
in  (b)  is  obtained  by  applying  a  unit  matrix  to  the  object.) 

Figure 1: The 2D image shown in (b) is closer to the image in (a) when the difference is computed in
transformation space, and closer to the image in (c) when the difference is the Euclidean difference between the
two images.

• The image metric between the images in (b) and (a) is 4, two pixels at each of the left
corners of the rectangle.

The image metric between the images in (b) and (c) is 2, one pixel at each of the upper
corners of the rectangle.

Therefore, according to the image metric, Figure 1(c) is closer to (b) than (a) is.

• To compute the transformation metric consider the planar object illustrated in (b). We
compute the difference between the matrices that represent the affine transformation from
(b) to both (a) and (c) and the matrix that represents the best rigid transformation (in
this case it is the unit matrix): (a) is obtained from (b) by a horizontal stretch of 9/7.
The transformation metric between (a) and (b) is therefore 9/7 - 1 = 2/7.

(c) is obtained from (b) by a vertical stretch of 3/2. The transformation metric in this
case is 3/2 - 1 = 1/2.

Therefore, according to the transformation metric, Figure 1(a) is closer to (b) than (c) is.

It is interesting to note that in this example the solution obtained by minimizing the transformation
metric seems to better correlate with human perception than the solution obtained by
minimizing the image metric.
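
The arithmetic of this example can be reproduced with a short sketch. The rectangle size (7 pixels wide, 2 pixels tall) and the choice of which edges stay fixed are assumptions inferred from the stated stretch factors and pixel displacements, not values given in the text; the helper names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical reference rectangle for Figure 1(b): 7 pixels wide, 2 pixels tall.
WIDTH, HEIGHT = 7, 2
reference = [(0, 0), (WIDTH, 0), (WIDTH, HEIGHT), (0, HEIGHT)]

def stretch(points, sx, sy, anchor):
    """Scale the points by (sx, sy) about a fixed anchor point."""
    ax, ay = anchor
    return [(ax + sx * (px - ax), ay + sy * (py - ay)) for px, py in points]

def image_metric(points_a, points_b):
    """Sum of Euclidean distances between corresponding points."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points_a, points_b):
        total += float(((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5)
    return total

def transformation_metric(sx, sy):
    """Frobenius distance between diag(sx, sy) and the unit matrix."""
    return float(((sx - 1) ** 2 + (sy - 1) ** 2) ** 0.5)

# (a): horizontal stretch by 9/7, right edge kept fixed (assumed).
image_a = stretch(reference, Fraction(9, 7), 1, (WIDTH, 0))
# (c): vertical stretch by 3/2, bottom edge kept fixed (assumed).
image_c = stretch(reference, 1, Fraction(3, 2), (0, 0))

img_a = image_metric(reference, image_a)          # 4.0
img_c = image_metric(reference, image_c)          # 2.0
tr_a = transformation_metric(Fraction(9, 7), 1)   # 2/7
tr_c = transformation_metric(1, Fraction(3, 2))   # 1/2

print(img_a, img_c, tr_a, tr_c)
```

Under these assumptions the ordering reverses exactly as described: (c) is nearer to (b) in image space, while (a) is nearer in transformation space.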

3.1 Derivation of $N_{im}$ and $N_{af}$

We now define the rigid and the affine image metrics explicitly. Under weak-perspective projection,
the position in the image, $\vec{q}_i = (x_i, y_i)$, of a model point $\vec{p}_i = (X_i, Y_i, Z_i)$ following a
rigid transformation is given by

$$\vec{q}_i = \Pi(R\vec{p}_i + \vec{t})$$

where $R$ is a scaled 3x3 rotation matrix, $\vec{t}$ is a translation vector, and $\Pi$ represents an
orthographic projection. More explicitly, denote by $\vec{r}_1^T$ and $\vec{r}_2^T$ the top two row vectors of $R$,
and denote $\vec{t} = (t_x, t_y, t_z)$; we have that

$$\begin{aligned} x_i &= \vec{r}_1^T\vec{p}_i + t_x \\ y_i &= \vec{r}_2^T\vec{p}_i + t_y \end{aligned} \tag{1}$$

where

$$\begin{aligned} \vec{r}_1^T\vec{r}_2 &= 0 \\ \vec{r}_1^T\vec{r}_1 &= \vec{r}_2^T\vec{r}_2 \end{aligned} \tag{2}$$

The rigid metric, $N_{im}$, minimizes the difference between the two sides of Eq. (1) subject to the
constraints (2).

When the object is allowed to undergo affine transformations, the rotation matrix $R$ is
replaced by a general 3x3 linear matrix (denoted by $A$) and the constraints (2) are ignored.
That is

$$\vec{q}_i = \Pi(A\vec{p}_i + \vec{t})$$

Denote by $\vec{a}_1^T$ and $\vec{a}_2^T$ the top two row vectors of $A$; we obtain

$$\begin{aligned} x_i &= \vec{a}_1^T\vec{p}_i + t_x \\ y_i &= \vec{a}_2^T\vec{p}_i + t_y \end{aligned} \tag{3}$$

The affine metric, $N_{af}$, minimizes the difference between the two sides of Eq. (3).

To define the rigid and the affine metrics, we first note that the translation component of
both the best rigid and affine transformations can be ignored if the centroids of both model
and image points are moved to the origin. In other words, we begin by translating the model
and image points so that

$$\sum_{i=1}^{n} \vec{p}_i = \sum_{i=1}^{n} \vec{q}_i = 0 \tag{4}$$

We claim that now $\vec{t} = 0$. The proof is given in Appendix C.

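Denote

$$P = \begin{pmatrix} X_1 & Y_1 & Z_1 \\ \vdots & \vdots & \vdots \\ X_n & Y_n & Z_n \end{pmatrix}$$

a matrix of model point coordinates, and denote

$$\vec{x} = (x_1, \ldots, x_n)^T, \qquad \vec{y} = (y_1, \ldots, y_n)^T \tag{5}$$

the location vectors of the corresponding image points. A rigid metric that reflects the desired
minimization is given by

$$N_{im} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0,\ \ \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2 \tag{6}$$

The corresponding affine metric is given by

$$N_{af} = \min_{\vec{a}_1, \vec{a}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{a}_1\|^2 + \|\vec{y} - P\vec{a}_2\|^2 \tag{7}$$

In the affine case the solution is simple. We assume that the rank of $P$ is 3 (the case for
general, not coplanar, 3D objects). Denote by $P^+ = (P^TP)^{-1}P^T$ the pseudo-inverse of $P$; we
obtain that

$$\begin{aligned} \vec{a}_1 &= P^+\vec{x} \\ \vec{a}_2 &= P^+\vec{y} \end{aligned} \tag{8}$$

And the affine distance is given by

$$N_{af} = \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 \tag{9}$$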
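
The affine solution of Eqs. (8) and (9) can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, a centred rank-3 model matrix, and synthetic data; the function name `affine_fit` and all numeric values are hypothetical.

```python
import numpy as np

def affine_fit(P, x, y):
    """Return (a1, a2, N_af) for a centred n x 3 model matrix P of rank 3
    and centred image coordinate vectors x, y (Eqs. 8 and 9)."""
    P_pinv = np.linalg.inv(P.T @ P) @ P.T      # P^+ = (P^T P)^{-1} P^T
    a1 = P_pinv @ x
    a2 = P_pinv @ y
    n_af = np.sum((x - P @ a1) ** 2) + np.sum((y - P @ a2) ** 2)
    return a1, a2, float(n_af)

rng = np.random.default_rng(1)
P = rng.normal(size=(8, 3))
P -= P.mean(axis=0)                            # move the model centroid to the origin

# An exact affine view of the model: the fit should recover it with zero residual.
a1_true = np.array([1.2, 0.3, -0.5])
a2_true = np.array([0.1, 0.8, 0.4])
a1, a2, n_af = affine_fit(P, P @ a1_true, P @ a2_true)

# Adding image noise makes the residual strictly positive.
x_noisy = P @ a1_true + 0.05 * rng.normal(size=8)
y_noisy = P @ a2_true + 0.05 * rng.normal(size=8)
x_noisy -= x_noisy.mean()
y_noisy -= y_noisy.mean()
_, _, n_af_noisy = affine_fit(P, x_noisy, y_noisy)

print(n_af, n_af_noisy)
```

The explicit normal-equations form is used here to mirror the definition of $P^+$ in the text; for poorly conditioned models a library pseudo-inverse routine would be the safer choice.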

Since  the  solution  in  the  rigid  case  is  significantly  more  difficult  than  the  solution  in  the 
affine  case,  often  the  affine  solution  is  considered,  and  the  rigidity  constraints  are  used  only  for 
verification  (e.g.  [Ullman  and  Basri,  1991,  Weinshall,  1993,  DeMenthon  and  Davis,  1992]). 

The constraints (2) (substituting $\vec{a}_i$ for $\vec{r}_i$, and using Eq. (8)) can be rewritten as

$$\begin{aligned} \vec{x}^T(P^+)^TP^+\vec{y} &= 0 \\ \vec{x}^T(P^+)^TP^+\vec{x} &= \vec{y}^T(P^+)^TP^+\vec{y} \end{aligned}$$

Denote

$$B = (P^+)^TP^+ \tag{10}$$

we obtain that

$$\begin{aligned} \vec{x}^TB\vec{y} &= 0 \\ \vec{x}^TB\vec{x} &= \vec{y}^TB\vec{y} \end{aligned} \tag{11}$$

where $B$ is an n x n symmetric, positive-semidefinite matrix of rank 3. (The rank would be
smaller if the object points are coplanar.)

We call $B$ the characteristic matrix of the object. $B$ is a natural extension to the 3x3
model-based invariant matrix defined in [Weinshall, 1993]. A more general definition, and its
efficient computation from images, is discussed in Appendix B.

3.2 Derivation of $N_{tr}$

We can now define a transformation metric as follows. Consider the affine solution. The nearest
“affine view” of the object is obtained by applying the model matrix, $P$, to a pair of vectors, $\vec{a}_1$
and $\vec{a}_2$, defined in Eq. (8). In general, this solution is not rigid, and so the rigid constraints (2)
do not hold for these vectors. The metric described here is based on the following rule. We are
looking for another pair of vectors, $\vec{r}_1$ and $\vec{r}_2$, which satisfy the rigid constraints, and minimize
the Euclidean distance to the affine vectors $\vec{a}_1$ and $\vec{a}_2$. $P\vec{r}_1$ and $P\vec{r}_2$ define the best rigid view
of the object under the defined metric. The metric, $N_{tr}$, is defined by

$$N_{tr} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{a}_1 - \vec{r}_1\|^2 + \|\vec{a}_2 - \vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0,\ \ \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2 \tag{12}$$

where $\vec{a}_1$ and $\vec{a}_2$ constitute the optimal affine solution; therefore

$$N_{tr} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0,\ \ \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2 \tag{13}$$

In  Section  5  we  present  a  closed-form  solution  for  this  metric,  and  in  Section  6  we  show  how 
this  metric  can  be  used  to  bound  the  image  metric  from  both  above  and  below. 


4  Summary  of  results 

In  the  rest  of  the  paper  we  prove  the  following  results: 


4.1  Transformation  space: 

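The transformation metric defined in Eq. (13) has the following solution

$$N_{tr} = \frac{1}{2}\left(\vec{x}^TB\vec{x} + \vec{y}^TB\vec{y} - 2\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}\right)$$

where $B$ is defined in Eq. (10), and $\vec{x}, \vec{y}$ in Eq. (5). The best view according to this metric is
given by

$$\begin{aligned} \vec{x}^* &= PP^+(\beta_1\vec{x} + \beta_2\vec{y}) \\ \vec{y}^* &= PP^+(\gamma_1\vec{x} + \gamma_2\vec{y}) \end{aligned}$$

where $\beta_1, \beta_2, \gamma_1, \gamma_2$ are defined in Appendix D.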

4.2 Image space:

Using $N_{tr}$ we can bound the image metric from both above and below. Denote

$$N_{af} = \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2$$

we show that

$$N_{af} + \lambda_1N_{tr} \le N_{im} \le N_{af} + \lambda_3N_{tr}$$

where $\lambda_1 \le \lambda_2 \le \lambda_3$ are the eigenvalues of $P^TP$. A sub-optimal solution to $N_{im}$ is given by

»,  . 

NaJ  +  - ; - Ntr 

til  + 

where  the  computation  of  is  described  in  Appendix  E.  A  tighter  upper  bound  is  deduced 
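from this sub-optimal solution

$$N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\}\,N_{tr} \le N_{af} + 2\lambda_2N_{tr}$$

where $\mathrm{H.M.}\{\lambda_2, \lambda_3\} = \frac{2\lambda_2\lambda_3}{\lambda_2 + \lambda_3}$ is the harmonic mean of $\lambda_2$ and $\lambda_3$. The sub-optimal solution is
proposed as an initial guess for an iterative algorithm to compute $N_{im}$.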


5  Closed-form  solution  in  transformation  space 

We now present a metric to compare 3D models and 2D images under weak-perspective
projection. The metric is a closed-form solution to the transformation metric $N_{tr}$ defined in
Eq. (13). We use the notation developed in Section 3: $B$ is the n x n characteristic matrix
of the object, and $\vec{x}, \vec{y} \in \mathbb{R}^n$ contain the x- and y-coordinates of the image features. The metric is
given by

$$N_{tr} = \frac{1}{2}\left(\vec{x}^TB\vec{x} + \vec{y}^TB\vec{y} - 2\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}\right) \tag{14}$$

This metric penalizes for the non-rigidities of the optimal affine transformation. Note that
$N_{tr} = 0$ if the two rigid constraints in Eq. (11) are satisfied. Otherwise, $N_{tr} > 0$ represents the
optimal penalty for a deviation from satisfying the two constraints.
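
As an illustration, Eq. (14) can be evaluated directly from the characteristic matrix. The sketch below assumes NumPy and a centred, rank-3 model matrix; the function name `transformation_metric`, the synthetic model, and the stretch factor are hypothetical choices made only to exercise the formula.

```python
import numpy as np

def transformation_metric(P, x, y):
    """Closed-form N_tr of Eq. (14) for a centred, rank-3 model matrix P."""
    P_pinv = np.linalg.pinv(P)
    B = P_pinv.T @ P_pinv                      # characteristic matrix, Eq. (10)
    xx, yy, xy = x @ B @ x, y @ B @ y, x @ B @ y
    # Guard against a tiny negative value caused by round-off.
    root = np.sqrt(max(xx * yy - xy ** 2, 0.0))
    return float(0.5 * (xx + yy - 2.0 * root))

rng = np.random.default_rng(2)
P = rng.normal(size=(10, 3))
P -= P.mean(axis=0)

# A scaled rotation: orthogonal rows of equal length, so N_tr should vanish.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
r1, r2 = 1.7 * Q[0], 1.7 * Q[1]
n_tr_rigid = transformation_metric(P, P @ r1, P @ r2)

# Stretching the first row by k breaks the equal-length constraint.
k = 1.5
n_tr_stretch = transformation_metric(P, P @ (k * r1), P @ r2)

print(n_tr_rigid, n_tr_stretch)
```

For the stretched view the two affine vectors remain orthogonal with lengths 1.7k and 1.7, so Eq. (14) reduces to half the squared difference of the lengths.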

Derivation  of  the  results: 

In the rest of this section we prove that the expression for $N_{tr}$, given by Eq. (14), is indeed
the solution to the transformation metric defined in Eq. (13). The proof proceeds as follows:
Theorem 1 computes the minimal solution when $\vec{r}_1$ and $\vec{r}_2$ are restricted to the plane spanned
by $\vec{a}_1$ and $\vec{a}_2$; Theorem 2 extends this result to three-space.

Theorem 1: When $\vec{r}_1$ and $\vec{r}_2$ are limited to span{$\vec{a}_1, \vec{a}_2$}, $N_{tr}$ is given by Eq. (14).

Figure 2: The vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{r}_1$, and $\vec{r}_2$ in the coordinate system specified in Theorem 1. $\vec{a}_1$ and $\vec{a}_2$ represent
the solution for the affine case. $\vec{r}_1$ and $\vec{r}_2$ are constrained to be in the same plane with $\vec{a}_1$ and $\vec{a}_2$, to be orthogonal,
and to share the same norm.

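Proof: We first define a new coordinate system in which

$$\begin{aligned} \vec{a}_1 &= w_1(1, 0) \\ \vec{a}_2 &= w_2(\cos\theta, \sin\theta) \\ \vec{r}_1 &= s(\cos\alpha, -\sin\alpha) \\ \vec{r}_2 &= s(\sin\alpha, \cos\alpha) \end{aligned}$$

(see Figure 2). $\theta$ is the angle between $\vec{a}_1$ and $\vec{a}_2$, $w_1$ and $w_2$ are the lengths of $\vec{a}_1$ and $\vec{a}_2$
respectively, $s$ is the common length of the two rotation vectors, $\vec{r}_1$ and $\vec{r}_2$, and $-\alpha$ is the angle
between $\vec{a}_1$ and $\vec{r}_1$. Without loss of generality it is assumed below that $0^\circ \le \theta \le 180^\circ$ and
$-90^\circ \le \alpha \le 90^\circ$. Notice that $w_1$, $w_2$, and $\theta$ are given and that $s$ and $\alpha$ are unknown.

Denote by $f$ the term to be minimized, that is

$$f(\alpha, s) = \|\vec{a}_1 - \vec{r}_1\|^2 + \|\vec{a}_2 - \vec{r}_2\|^2$$

then

$$\begin{aligned} f(\alpha, s) &= (w_1 - s\cos\alpha)^2 + s^2\sin^2\alpha + (s\sin\alpha - w_2\cos\theta)^2 + (s\cos\alpha - w_2\sin\theta)^2 \\ &= w_1^2 + w_2^2 + 2s^2 - 2s\left([w_1 + w_2\sin\theta]\cos\alpha + w_2\cos\theta\sin\alpha\right) \end{aligned}$$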

The partial derivatives of $f$ are given by

$$\begin{aligned} f_\alpha &= 2s\left([w_1 + w_2\sin\theta]\sin\alpha - w_2\cos\theta\cos\alpha\right) \\ f_s &= 4s - 2\left([w_1 + w_2\sin\theta]\cos\alpha + w_2\cos\theta\sin\alpha\right) \end{aligned}$$

To find possible minima we equate these derivatives to zero

$$f_\alpha = 0, \qquad f_s = 0$$

Solutions with $s = 0$ are not optimal. In this case $f(\alpha, 0) = w_1^2 + w_2^2$, and later we show that
solutions with $s > 0$ always imply smaller values for $f$.

When $s \ne 0$, $f_\alpha = 0$ implies

$$\tan\alpha^{min} = \frac{w_2\cos\theta}{w_1 + w_2\sin\theta}$$

therefore

$$\cos\alpha^{min} = \frac{1}{\sqrt{1 + (\tan\alpha^{min})^2}} = \frac{w_1 + w_2\sin\theta}{\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta}}$$

$f_s = 0$ implies

$$s^{min} = \frac{1}{2}\left([w_1 + w_2\sin\theta]\cos\alpha^{min} + w_2\cos\theta\sin\alpha^{min}\right)$$


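Notice the similarity of this expression to the expression for $f$. At the minimum point $f$ can
be rewritten as

$$f^{min} = w_1^2 + w_2^2 - 2(s^{min})^2 \tag{15}$$

(From which it is apparent that any solution for $f$ with $s \ne 0$ would be smaller than the solution
with $s = 0$.) Substituting for $\alpha^{min}$ we obtain

$$\begin{aligned} s^{min} &= \frac{1}{2}\left([w_1 + w_2\sin\theta]\cos\alpha^{min} + w_2\cos\theta\sin\alpha^{min}\right) \\ &= \frac{1}{2}\cos\alpha^{min}\left(w_1 + w_2\sin\theta + w_2\cos\theta\tan\alpha^{min}\right) \\ &= \frac{w_1 + w_2\sin\theta}{2\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta}}\left(w_1 + w_2\sin\theta + \frac{w_2^2\cos^2\theta}{w_1 + w_2\sin\theta}\right) \\ &= \frac{1}{2}\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta} \end{aligned}$$

and therefore

$$f^{min} = w_1^2 + w_2^2 - 2(s^{min})^2 = w_1^2 + w_2^2 - \frac{1}{2}\left(w_1^2 + w_2^2 + 2w_1w_2\sin\theta\right)$$

or,

$$f^{min} = \frac{1}{2}\left(w_1^2 + w_2^2 - 2w_1w_2\sin\theta\right)$$

Recall that $w_1$ and $w_2$ are the lengths of $\vec{a}_1$ and $\vec{a}_2$, that is

$$\begin{aligned} w_1^2 &= \vec{a}_1^T\vec{a}_1 = \vec{x}^TB\vec{x} \\ w_2^2 &= \vec{a}_2^T\vec{a}_2 = \vec{y}^TB\vec{y} \end{aligned}$$

and $\theta$ is the angle between the two vectors, namely

$$w_1w_2\sin\theta = \sqrt{w_1^2w_2^2(1 - \cos^2\theta)} = \sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}$$

We obtain that

$$f^{min} = \frac{1}{2}\left(\vec{x}^TB\vec{x} + \vec{y}^TB\vec{y} - 2\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}\right)$$

□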
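
The stationary point derived in this proof can be compared against a brute-force search. The sketch below uses only the Python standard library; the values of `w1`, `w2`, and `theta` are arbitrary test inputs, and the grid resolution is an assumption chosen to keep the run short.

```python
import math

def f(alpha, s, w1, w2, theta):
    """Objective of Theorem 1, evaluated directly from the four planar vectors."""
    a1 = (w1, 0.0)
    a2 = (w2 * math.cos(theta), w2 * math.sin(theta))
    r1 = (s * math.cos(alpha), -s * math.sin(alpha))
    r2 = (s * math.sin(alpha), s * math.cos(alpha))
    return ((a1[0] - r1[0]) ** 2 + (a1[1] - r1[1]) ** 2
            + (a2[0] - r2[0]) ** 2 + (a2[1] - r2[1]) ** 2)

def closed_form_minimum(w1, w2, theta):
    """Stationary point (alpha_min, s_min) and minimum value from the proof."""
    alpha_min = math.atan2(w2 * math.cos(theta), w1 + w2 * math.sin(theta))
    s_min = 0.5 * math.sqrt(w1 ** 2 + w2 ** 2 + 2 * w1 * w2 * math.sin(theta))
    f_min = 0.5 * (w1 ** 2 + w2 ** 2 - 2 * w1 * w2 * math.sin(theta))
    return alpha_min, s_min, f_min

w1, w2, theta = 1.3, 0.8, math.radians(70.0)   # arbitrary test values
alpha_min, s_min, f_min = closed_form_minimum(w1, w2, theta)

# Independent check: search a grid of alpha in [-90, 90] degrees (0.5 degree steps)
# and s in [0, 2] (steps of 0.005).
grid_min = min(
    f(math.radians(a / 2.0), s / 200.0, w1, w2, theta)
    for a in range(-180, 181)
    for s in range(0, 401)
)

print(f(alpha_min, s_min, w1, w2, theta), f_min, grid_min)
```

The grid value can only approach the closed-form minimum from above, which is consistent with the stationary point being the minimum inside the plane.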

In Theorem 1 we proved that if $\vec{r}_1$ and $\vec{r}_2$ are restricted to the plane spanned by $\vec{a}_1$ and $\vec{a}_2$,
the metric $N_{tr}$ is given by Eq. (14). In Theorem 2 below we prove that any other solution for
$\vec{r}_1$ and $\vec{r}_2$ results in a larger value for $f$, and therefore the minimum for $f$ is obtained inside the
plane, implying that $N_{tr}$ indeed is given by Eq. (14).

Theorem 2: The optimal $\vec{r}_1$ and $\vec{r}_2$ lie in the plane spanned by $\vec{a}_1$ and $\vec{a}_2$.

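Proof: Assume, by way of contradiction, that $\vec{r}_1, \vec{r}_2 \notin$ span{$\vec{a}_1, \vec{a}_2$}; we show that the
corresponding value for $f$ is not minimal.

Consider first the plane spanned by $\vec{r}_2$ and $\vec{a}_1$, and assume, by way of contradiction, that
$\vec{r}_1 \notin$ span{$\vec{r}_2, \vec{a}_1$}; we show that there exists a vector $\vec{r}_1'$ such that

$$\|\vec{r}_1'\| = \|\vec{r}_2\|, \qquad \vec{r}_1' \perp \vec{r}_2$$

and

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

contradicting the optimality of $f$.

Assume $\|\vec{r}_2\| = s$, and denote by $\vec{r}_1'$ a vector with length $s$ in the direction $(\vec{r}_2 \times \vec{a}_1) \times \vec{r}_2$.
This vector lies in span{$\vec{r}_2, \vec{a}_1$} and satisfies

$$\|\vec{r}_1'\| = \|\vec{r}_2\|, \qquad \vec{r}_1' \perp \vec{r}_2$$

(There exist two such vectors, opposing in their direction. We consider the one nearest to $\vec{a}_1$.)
We now show that

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

Denote the angle between $\vec{a}_1$ and $\vec{r}_1'$ by $\alpha$, and denote the angle between $\vec{r}_1'$ and $\vec{r}_1$ by $\beta$. Also,
denote $w_1 = \|\vec{a}_1\|$ and $s = \|\vec{r}_1\| = \|\vec{r}_2\| = \|\vec{r}_1'\|$. We can rotate the coordinate system so as to
obtain

$$\begin{aligned} \vec{r}_1' &= s(1, 0, 0) \\ \vec{r}_2 &= s(0, 1, 0) \\ \vec{a}_1 &= w_1(\cos\alpha, \sin\alpha, 0) \\ \vec{r}_1 &= s(\cos\beta, 0, \sin\beta) \end{aligned}$$

Now,

$$\begin{aligned} \|\vec{r}_1' - \vec{a}_1\|^2 &= (s - w_1\cos\alpha)^2 + w_1^2\sin^2\alpha = w_1^2 + s^2 - 2sw_1\cos\alpha \\ \|\vec{r}_1 - \vec{a}_1\|^2 &= (s\cos\beta - w_1\cos\alpha)^2 + w_1^2\sin^2\alpha + s^2\sin^2\beta = w_1^2 + s^2 - 2sw_1\cos\alpha\cos\beta \end{aligned}$$

and therefore, when $\alpha \ne 90^\circ$ and $\beta \ne 0^\circ$ (when $\beta = 0^\circ$, $\vec{r}_1$ and $\vec{r}_1'$ coincide),

$$\|\vec{r}_1' - \vec{a}_1\| < \|\vec{r}_1 - \vec{a}_1\|$$

contradicting the minimality property. Therefore, $\vec{r}_1 \in$ span{$\vec{r}_2, \vec{a}_1$}. Similarly, it can be shown
that $\vec{r}_2 \in$ span{$\vec{r}_1, \vec{a}_2$}; therefore all four vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{r}_1$, and $\vec{r}_2$ lie in a single plane.

□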
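
A numerical sanity check of Theorems 1 and 2 together: the in-plane stationary point attains the closed-form value, and randomly sampled rigid pairs in three-space never fall below it. This is a sketch assuming NumPy; the affine vectors `a1` and `a2`, the sample count, and the helper names are hypothetical, and random sampling illustrates the claim rather than proving it.

```python
import numpy as np

def closed_form(a1, a2):
    """N_tr of Eq. (14), using x^T B x = |a1|^2, y^T B y = |a2|^2, x^T B y = a1 . a2."""
    root = np.sqrt(max((a1 @ a1) * (a2 @ a2) - (a1 @ a2) ** 2, 0.0))
    return 0.5 * (a1 @ a1 + a2 @ a2 - 2.0 * root)

def objective(a1, a2, r1, r2):
    """Squared transformation-space distance between (a1, a2) and (r1, r2)."""
    return np.sum((a1 - r1) ** 2) + np.sum((a2 - r2) ** 2)

def in_plane_minimizer(a1, a2):
    """Stationary point of Theorem 1, written in an orthonormal basis (e1, e2)
    of span{a1, a2} with e1 along a1."""
    w1 = np.linalg.norm(a1)
    e1 = a1 / w1
    c = a2 @ e1                      # w2 * cos(theta)
    perp = a2 - c * e1
    sn = np.linalg.norm(perp)        # w2 * sin(theta), non-negative
    e2 = perp / sn
    r1 = 0.5 * ((w1 + sn) * e1 - c * e2)
    r2 = 0.5 * (c * e1 + (w1 + sn) * e2)
    return r1, r2

a1 = np.array([1.0, 0.2, -0.3])      # hypothetical affine vectors
a2 = np.array([0.1, 0.7, 0.5])
bound = closed_form(a1, a2)
r1, r2 = in_plane_minimizer(a1, a2)
attained = objective(a1, a2, r1, r2)

# Sample scaled rotations: two rows of a random orthogonal matrix times a scale s.
rng = np.random.default_rng(3)
sampled_best = np.inf
for _ in range(5000):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    s = rng.uniform(0.0, 2.0)
    sampled_best = min(sampled_best, objective(a1, a2, s * Q[0], s * Q[1]))

print(bound, attained, sampled_best)
```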


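Corollary 3: The transformation metric is given by

$$N_{tr} = \frac{1}{2}\left(\vec{x}^TB\vec{x} + \vec{y}^TB\vec{y} - 2\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}\right)$$

and the best view for this metric is

$$\begin{aligned} \vec{x}^* &= PP^+(\beta_1\vec{x} + \beta_2\vec{y}) \\ \vec{y}^* &= PP^+(\gamma_1\vec{x} + \gamma_2\vec{y}) \end{aligned}$$

where

$$\begin{aligned} \beta_1 &= \frac{1}{2}\left(1 + \frac{\vec{y}^TB\vec{y}}{\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}}\right) \\ \beta_2 = \gamma_1 &= -\frac{\vec{x}^TB\vec{y}}{2\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}} \\ \gamma_2 &= \frac{1}{2}\left(1 + \frac{\vec{x}^TB\vec{x}}{\sqrt{\vec{x}^TB\vec{x} \cdot \vec{y}^TB\vec{y} - (\vec{x}^TB\vec{y})^2}}\right) \end{aligned}$$

Proof: The expression for the metric immediately follows from Theorems 1 and 2. The
expression for the best view is developed in Appendix D.

□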

6  Solution  in  image  space 

In order to compute the image metric as defined in Section 3, we need to solve the constrained
minimization problem defined in Eq. (6)

$$N_{im} = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0,\ \ \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2$$

Section 6.1 shows that $N_{tr}$, computed in the previous section, can be used to bound $N_{im}$
from both above and below. Section 6.2 describes a direct method to compute a sub-optimal
approximation to $N_{im}$ and outlines an iterative algorithm to improve this estimate to obtain
the optimal $N_{im}$.

6.1  Bounding  the  image  metric  with  the  transformation  metric 

In this section we show that using the transformation metric $N_{tr}$ defined in Section 5, and the
affine metric $N_{af}$ (given in Eq. (9)), we can bound the image metric $N_{im}$ from both above and
below. We prove the following theorem:

Theorem 4: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ denote the three eigenvalues of $P^TP$, then

$$N_{af} + \lambda_1N_{tr} \le N_{im} \le N_{af} + \lambda_3N_{tr} \tag{16}$$

Proof:  Denote  by  and  the  vectors  that  minimize  the  term  for  the  image  metric  given 

in  Eq.  (6),  namely 

and  denote  by  fi  and  f2  the  vectors  that  minimize  the  transformation  metric  given  in  Eq.  (13), 
namely 

lV.,  =  llP+x-f,lp  +  |lP+y-f2|p 

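We start by showing the upper bound. Since $\vec{r}_1^*$ and $\vec{r}_2^*$ minimize the term for $N_{im}$, we can
write

$$N_{im} = \|\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - P\vec{r}_2^*\|^2 \le \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2$$

We now break each term in this sum into two orthogonal components as follows

$$\vec{x} - P\vec{r}_1 = (\vec{x} - PP^+\vec{x}) + (PP^+\vec{x} - P\vec{r}_1)$$

for which it holds that

$$(\vec{x} - PP^+\vec{x})^T (PP^+\vec{x} - P\vec{r}_1) = 0$$

The orthogonality readily follows from the identity

$$(PP^+)^T P = (P^+)^T P^T P = P(P^TP)^{-1}(P^TP) = P$$

Since the two components are orthogonal it holds that

$$\|\vec{x} - P\vec{r}_1\|^2 = \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1\|^2$$

and, similarly,

$$\|\vec{y} - P\vec{r}_2\|^2 = \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2\|^2$$

Therefore (recall that $\vec{r}_1$ and $\vec{r}_2$ minimize $N_{tr}$ and that $\lambda_3$ is the largest eigenvalue of $P^TP$)

$$
\begin{aligned}
N_{im} &\le \|\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - P\vec{r}_2\|^2 \\
&= \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1\|^2 + \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2\|^2 \\
&= \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 + \|P(P^+\vec{x} - \vec{r}_1)\|^2 + \|P(P^+\vec{y} - \vec{r}_2)\|^2 \\
&= N_{af} + \|P(P^+\vec{x} - \vec{r}_1)\|^2 + \|P(P^+\vec{y} - \vec{r}_2)\|^2 \\
&\le N_{af} + \lambda_3\left(\|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2\right) \\
&= N_{af} + \lambda_3 N_{tr}
\end{aligned}
$$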


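Next, we prove the lower bound. The proof is similar to the proof in the upper bound case,
but this time we start by breaking up the terms into orthogonal components. Then we use the
facts that $\vec{r}_1$ and $\vec{r}_2$ minimize $N_{tr}$ and that $\lambda_1$ is the smallest eigenvalue of $P^TP$.

$$
\begin{aligned}
N_{im} &= \|\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - P\vec{r}_2^*\|^2 \\
&= \|\vec{x} - PP^+\vec{x}\|^2 + \|PP^+\vec{x} - P\vec{r}_1^*\|^2 + \|\vec{y} - PP^+\vec{y}\|^2 + \|PP^+\vec{y} - P\vec{r}_2^*\|^2 \\
&= \|(I - PP^+)\vec{x}\|^2 + \|(I - PP^+)\vec{y}\|^2 + \|P(P^+\vec{x} - \vec{r}_1^*)\|^2 + \|P(P^+\vec{y} - \vec{r}_2^*)\|^2 \\
&= N_{af} + \|P(P^+\vec{x} - \vec{r}_1^*)\|^2 + \|P(P^+\vec{y} - \vec{r}_2^*)\|^2 \\
&\ge N_{af} + \lambda_1\left(\|P^+\vec{x} - \vec{r}_1^*\|^2 + \|P^+\vec{y} - \vec{r}_2^*\|^2\right) \\
&\ge N_{af} + \lambda_1 N_{tr}
\end{aligned}
$$

Consequently

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \lambda_3 N_{tr}$$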


□ 
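Both eigenvalue steps in the proof of Theorem 4 rest on the standard Rayleigh-quotient bound for the symmetric matrix $P^TP$. The worked step below is added as a reading aid; it uses only symbols already defined in this section, with $\vec{v}$ standing for an arbitrary vector in $\mathbb{R}^3$.

```latex
\|P\vec{v}\|^2 = \vec{v}^T (P^T P)\,\vec{v},
\qquad
\lambda_1 \|\vec{v}\|^2 \;\le\; \vec{v}^T (P^T P)\,\vec{v} \;\le\; \lambda_3 \|\vec{v}\|^2 .
```

Taking $\vec{v} = P^+\vec{x} - \vec{r}_1$ and $\vec{v} = P^+\vec{y} - \vec{r}_2$ and summing gives the $\lambda_3$ step of the upper bound. The same bound applied to the minimizers of the image metric gives the $\lambda_1$ step of the lower bound, and the final inequality there holds because those minimizers satisfy the rigidity constraints and therefore cannot achieve a value below the minimum $N_{tr}$.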
6.2  Direct solution for the image metric

In this section we develop tighter bounds on the image metric by direct methods, following
the same steps we took in the derivation of the transformation metric in Section 5. Unlike for
the transformation metric, we cannot obtain a closed-form solution for the image metric, but
we can obtain a better estimator than we have previously described. This also enables us to
develop an iterative method to compute the distance exactly.

In  section  6.2.1  we  describe  a  change  of  coordinate  system,  arriving  at  a  minimization 
problem  which  is  similar  to  the  one  we  had  to  solve  for  the  transformation  metric.  The  difference 
is  that  the  sought  vectors  are  constrained  to  lie  on  an  ellipsoid  rather  than  a  sphere,  and  the 
ellipsoid  is  defined  by  a  3  x  3  positive-definite  version  of  the  characteristic  matrix  B. 

In section 6.2.2 we restrict the solution vectors, $\vec{u}$, $\vec{v}$, to lie in a plane with the data vectors,
$\vec{X}$, $\vec{Y}$, and we derive the optimal solution under this constraint. The solution, however, is only
sub-optimal, since in contrast to the transformation metric, the optimal solution in this case
does not have to lie in the plane. Using this solution we derive a tighter upper bound on the
optimal solution.

In  section  6.2.3  we  describe  the  general  problem  that  needs  to  be  solved,  and  outline  an 
iterative  method.  We  propose  the  solution  obtained  in  the  plane  as  an  initial  guess  for  this 
method. 


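6.2.1  Reducing the dimensionality of the problem

In Section 6.1 we have shown that the image metric can be broken into two orthogonal terms,
implying that

$$N_{im} = N_{af} + \|P(P^+\vec{x} - \vec{r}_1^*)\|^2 + \|P(P^+\vec{y} - \vec{r}_2^*)\|^2 \tag{17}$$

This property is useful for a direct computation of the image metric. The first term, $N_{af}$, does
not depend on $\vec{r}_1$, $\vec{r}_2$. To compute $N_{im}$, therefore, only the second term needs to be minimized:

$$\min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|PP^+\vec{x} - P\vec{r}_1\|^2 + \|PP^+\vec{y} - P\vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0, \quad \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2 \tag{18}$$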

Note first that $PP^+\vec{x}$ and $PP^+\vec{y}$, two vectors in $\mathbb{R}^n$, both lie in a single linear subspace of
dimension 3. (This follows from the fact, shown in [Ullman and Basri, 1991], that every image
of a 3D object can be written as a linear combination of three independent views.) Moreover,
the three columns of $P$ lie in the same subspace. It therefore follows that the vectors $\vec{u} = P\vec{r}_1$
and $\vec{v} = P\vec{r}_2$ must also lie in this subspace.

Denote $\vec{X} = PP^+\vec{x}$ and $\vec{Y} = PP^+\vec{y}$, the projections of $\vec{x}$ and $\vec{y}$ onto the column space of $P$,
and denote $\vec{u} = P\vec{r}_1$ and $\vec{v} = P\vec{r}_2$. (Note that $\vec{r}_1 = P^+\vec{u}$, $\vec{r}_2 = P^+\vec{v}$, and $B = (P^+)^TP^+$, the
characteristic matrix of the object.) We rewrite the problem as follows

$$\min_{\vec{u}, \vec{v} \in \mathbb{R}^n} \|\vec{X} - \vec{u}\|^2 + \|\vec{Y} - \vec{v}\|^2 \quad \text{s.t.} \quad \vec{u}^T B\vec{v} = 0, \quad \vec{u}^T B\vec{u} = \vec{v}^T B\vec{v} \tag{19}$$

Since all the vectors, $\vec{X}$, $\vec{Y}$, $\vec{u}$, and $\vec{v}$, lie in a 3D subspace (the column space of $P$) we can
perform the minimization in $\mathbb{R}^3$. To transform the system into $\mathbb{R}^3$, we rotate the vectors and the
characteristic matrix so as to get nontrivial (nonzero) values only in three of the coordinates.
Recall that distances and quadratic forms are invariant under rotation. The rotation matrix $\Omega$
that should be applied to all terms is defined by the eigenvectors of $B$. Applying this matrix
to $B$ (in the form $\Omega^T B\,\Omega$) results in a diagonal matrix with the three positive eigenvalues of $B$.

6.2.2  Closed-form  solution  in  the  plane 

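Theorem 5: When $\vec{u}$ and $\vec{v}$ are limited to $\mathrm{span}\{\vec{X}, \vec{Y}\}$, the solution of Eq. (19) is given by

$$\tilde{N}_{im} = \frac{\mu_1\mu_2}{\mu_1 + \mu_2}\left(\vec{x}^T B\vec{x} + \vec{y}^T B\vec{y} - 2\sqrt{\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y} - (\vec{x}^T B\vec{y})^2}\right) \tag{20}$$

where $\sqrt{\mu_1} \le \sqrt{\mu_2}$ are the principal axes of the ellipse, defined by the intersection of the ellipsoid
$B$ with the plane $\mathrm{span}\{\vec{X}, \vec{Y}\}$.

Note the similarity between this solution and $N_{tr}$ in Eq. (14). In fact,

$$\tilde{N}_{im} = \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} \tag{21}$$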

The  proof  closely  follows  the  proof  for  Ntr  presented  in  Section  5  (Theorem  2).  We  therefore 
skip  some  of  the  details. 

Proof: We first define a new coordinate system in which

$$
\begin{aligned}
\vec{X} &= w_1(\sqrt{\mu_1}\cos\eta, \sqrt{\mu_2}\sin\eta) \\
\vec{Y} &= w_2(\sqrt{\mu_1}\cos\theta, \sqrt{\mu_2}\sin\theta) \\
\vec{u} &= s(\sqrt{\mu_1}\cos\alpha, \sqrt{\mu_2}\sin\alpha) \\
\vec{v} &= s(-\sqrt{\mu_1}\sin\alpha, \sqrt{\mu_2}\cos\alpha)
\end{aligned}
$$

$$B = \begin{pmatrix} 1/\mu_1 & 0 & B_{13} \\ 0 & 1/\mu_2 & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}$$

Without loss of generality it is assumed below that $-90^\circ \le \eta \le 90^\circ$, $\eta \le \theta \le \eta + 180^\circ$, and
$-90^\circ \le \alpha \le 90^\circ$. Notice that $w_1$, $w_2$, $\eta$ and $\theta$ are given and that $s$ and $\alpha$ are unknown.

Notice that this setting of coordinate system is similar to the one used in Theorem 1 with
the exceptions that here $\vec{u}$ and $\vec{v}$ lie on an ellipse rather than on a circle, and that in general
none of the points can be brought to lie on a principal axis.

Denote by $f$ the term to be minimized, that is

$$f(\alpha, s) = \|\vec{X} - \vec{u}\|^2 + \|\vec{Y} - \vec{v}\|^2$$

then

$$
\begin{aligned}
f(\alpha, s) &= \mu_1(w_1\cos\eta - s\cos\alpha)^2 + \mu_2(w_1\sin\eta - s\sin\alpha)^2 + \mu_1(w_2\cos\theta + s\sin\alpha)^2 + \mu_2(w_2\sin\theta - s\cos\alpha)^2 \\
&= w_1^2(\mu_1\cos^2\eta + \mu_2\sin^2\eta) + w_2^2(\mu_1\cos^2\theta + \mu_2\sin^2\theta) + s^2(\mu_1 + \mu_2) \\
&\quad - 2s(w_1\mu_1\cos\eta\cos\alpha + w_1\mu_2\sin\eta\sin\alpha - w_2\mu_1\cos\theta\sin\alpha + w_2\mu_2\sin\theta\cos\alpha)
\end{aligned}
$$

The partial derivatives of $f$ are given by

$$f_\alpha = 2s\left[(w_1\mu_1\cos\eta + w_2\mu_2\sin\theta)\sin\alpha - (w_1\mu_2\sin\eta - w_2\mu_1\cos\theta)\cos\alpha\right]$$

$$f_s = 2s(\mu_1 + \mu_2) - 2(w_1\mu_1\cos\eta\cos\alpha + w_1\mu_2\sin\eta\sin\alpha - w_2\mu_1\cos\theta\sin\alpha + w_2\mu_2\sin\theta\cos\alpha)$$

To find possible minima we equate these derivatives to zero

$$f_\alpha = 0$$

$$f_s = 0$$

Again, solutions with $s = 0$ can be ignored since they do not correspond to the global minimum
(for a similar reason as in the proof of Theorem 1).

When $s \neq 0$, $f_\alpha = 0$ implies

$$\tan\alpha^{min} = \frac{w_1\mu_2\sin\eta - w_2\mu_1\cos\theta}{w_1\mu_1\cos\eta + w_2\mu_2\sin\theta}$$

$f_s = 0$ implies

$$s^{min} = \frac{w_1\mu_1\cos\eta + w_2\mu_2\sin\theta}{\cos\alpha^{min}(\mu_1 + \mu_2)}$$

and, similarly to Eq. (15),

$$f^{min} = w_1^2(\mu_1\cos^2\eta + \mu_2\sin^2\eta) + w_2^2(\mu_1\cos^2\theta + \mu_2\sin^2\theta) - (\mu_1 + \mu_2)(s^{min})^2 \tag{22}$$

We substitute $s^{min}$ and $\cos\alpha^{min}$, using the identity $\cos\alpha = \frac{1}{\sqrt{1 + \tan^2\alpha}}$, into Eq. (22). After
some manipulations, we obtain

$$\tilde{N}_{im} = f^{min} = \frac{\mu_1\mu_2}{\mu_1 + \mu_2}\left(w_1^2 + w_2^2 - 2w_1w_2\sin(\theta - \eta)\right) \tag{23}$$


Note that

$$(PP^+)^T B (PP^+) = (P^+)^T P^T (P^+)^T P^+ P P^+ = (P^+)^T (P^+P)^T (P^+P) P^+ = (P^+)^T P^+ = B \tag{24}$$

from which it follows that

$$
\begin{aligned}
w_1^2 &= \vec{X}^T B\vec{X} = \vec{x}^T B\vec{x} \\
w_2^2 &= \vec{Y}^T B\vec{Y} = \vec{y}^T B\vec{y} \\
w_1w_2\cos(\theta - \eta) &= \vec{X}^T B\vec{Y} = \vec{x}^T B\vec{y}
\end{aligned}
\tag{25}
$$

We substitute the identities from Eq. (25) into Eq. (23), obtaining the expression for $\tilde{N}_{im}$
in Eq. (20).

□

The derivation for $\mu_1$ and $\mu_2$ is given in Appendix E.

The  sub-optimal  solution  in  the  plane  can  be  used  to  improve  the  bounds  on  the  image 
metric,  which  were  previously  discussed  in  Theorem  4. 

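Theorem 6: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ be the three eigenvalues of $P^TP$, then

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr} \tag{26}$$

where $\mathrm{H.M.}\{\lambda_2, \lambda_3\} = \frac{2\lambda_2\lambda_3}{\lambda_2 + \lambda_3}$, the harmonic mean of $\lambda_2$, $\lambda_3$.

Proof: The eigenvalues of the characteristic matrix $B$ are $1/\lambda_1$, $1/\lambda_2$, and $1/\lambda_3$. (This is shown
in Appendix E.) Since $1/\mu_1$ and $1/\mu_2$ represent the eigenvalues of a section of $B$ it holds that
(see, e.g., [Strang, 1976] p. 270)

$$\frac{1}{\lambda_3} \le \frac{1}{\mu_2} \le \frac{1}{\lambda_2} \le \frac{1}{\mu_1} \le \frac{1}{\lambda_1}$$

Using Eq. (21) we obtain that

$$\tilde{N}_{im} = \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} = \frac{2}{\frac{1}{\mu_1} + \frac{1}{\mu_2}} N_{tr} \le \frac{2}{\frac{1}{\lambda_2} + \frac{1}{\lambda_3}} N_{tr} = \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr}$$

And, using Eq. (17), we obtain the upper bound

$$N_{im} \le N_{af} + \tilde{N}_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr}$$

□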


Corollary 7:

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + \frac{2\mu_1\mu_2}{\mu_1 + \mu_2} N_{tr} \tag{27}$$
Note that, since $\mathrm{H.M.}\{a, b\} \le 2\min\{a, b\}$ for every $a, b$, we have the following corollary.

Corollary 8:

$$N_{af} + \lambda_1 N_{tr} \le N_{im} \le N_{af} + 2\lambda_2 N_{tr}$$
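The harmonic-mean inequality behind Corollary 8 takes one line to verify. The step below is added as a reading aid and assumes $0 < a \le b$, which covers the case $a = \lambda_2$, $b = \lambda_3$ used here.

```latex
\mathrm{H.M.}\{a, b\} \;=\; \frac{2ab}{a + b} \;=\; 2a \cdot \frac{b}{a + b} \;\le\; 2a \;=\; 2\min\{a, b\},
\qquad \text{since } 0 < \frac{b}{a + b} < 1 .
```

Substituting $a = \lambda_2$ and $b = \lambda_3$ into the upper bound of Theorem 6 replaces $\mathrm{H.M.}\{\lambda_2, \lambda_3\}$ by $2\lambda_2$, a looser bound that no longer depends on the largest eigenvalue.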

We cannot yet improve the lower bound in Theorem 4, but we conjecture the following.

Conjecture 1: Let $0 < \lambda_1 \le \lambda_2 \le \lambda_3$ be the three eigenvalues of $P^TP$, then

$$N_{af} + \mathrm{H.M.}\{\lambda_1, \lambda_2\} N_{tr} \le N_{im} \le N_{af} + \mathrm{H.M.}\{\lambda_2, \lambda_3\} N_{tr} \tag{28}$$

Motivation: We know that if the two data points lie on the ellipse whose principal
axes are of length $\lambda_1$, $\lambda_2$ (the smallest cross-section of the ellipsoid $B$), then

$$N_{im} = N_{af} + \mathrm{H.M.}\{\lambda_1, \lambda_2\} N_{tr}$$

We can show that this solution is a local minimum, namely, it is not possible to improve the
solution by applying small perturbations to the solution vectors.

□


6.2.3  An  iterative  optimal  solution 

The  solution  we  obtained  in  Theorem  5  is  sub-optimal;  it  is  not  the  lowest  distance.  We  now 
give  the  cost  function,  a  function  of  four  variables,  which  should  be  minimized  to  obtain  the 
precise  value  of  the  image  metric. 

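We first define a coordinate system such that

$$
\begin{aligned}
\vec{X} &= w_1(\sqrt{\lambda_1}\cos\theta\cos\nu, \sqrt{\lambda_2}\cos\theta\sin\nu, \sqrt{\lambda_3}\sin\theta) \\
\vec{Y} &= w_2(\sqrt{\lambda_1}\cos\zeta\cos\eta, \sqrt{\lambda_2}\cos\zeta\sin\eta, \sqrt{\lambda_3}\sin\zeta) \\
\vec{u} &= s(\sqrt{\lambda_1}\cos\alpha\cos\beta, \sqrt{\lambda_2}\cos\alpha\sin\beta, \sqrt{\lambda_3}\sin\alpha) \\
\vec{v} &= s(\sqrt{\lambda_1}(\sin\beta\cos\gamma + \sin\alpha\cos\beta\sin\gamma), \sqrt{\lambda_2}(-\cos\beta\cos\gamma + \sin\alpha\sin\beta\sin\gamma), -\sqrt{\lambda_3}\cos\alpha\sin\gamma)
\end{aligned}
$$

$$B = \begin{pmatrix} 1/\lambda_1 & 0 & 0 \\ 0 & 1/\lambda_2 & 0 \\ 0 & 0 & 1/\lambda_3 \end{pmatrix}$$

where $w_1$, $w_2$, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\theta$, $\nu$, $\zeta$, and $\eta$ are known, and $s$, $\alpha$, $\beta$ and $\gamma$ are free.

Note that this setting of coordinate system is similar to the one used in Theorem 5, but
now $\vec{u}$ and $\vec{v}$ lie on an ellipsoid rather than on an ellipse.

In this notation the free parameters are selected so as to satisfy the two rigid constraints,
$\vec{u}^T B\vec{u} = \vec{v}^T B\vec{v}$ and $\vec{u}^T B\vec{v} = 0$. To compute the image metric, the following function should be
minimized.

$$
\begin{aligned}
f(s, \alpha, \beta, \gamma) &= \lambda_1(s\cos\alpha\cos\beta - w_1\cos\theta\cos\nu)^2 + \lambda_2(s\cos\alpha\sin\beta - w_1\cos\theta\sin\nu)^2 \\
&\quad + \lambda_3(s\sin\alpha - w_1\sin\theta)^2 \\
&\quad + \lambda_1(s\sin\beta\cos\gamma + s\sin\alpha\cos\beta\sin\gamma - w_2\cos\zeta\cos\eta)^2 \\
&\quad + \lambda_2(-s\cos\beta\cos\gamma + s\sin\alpha\sin\beta\sin\gamma - w_2\cos\zeta\sin\eta)^2 \\
&\quad + \lambda_3(-s\cos\alpha\sin\gamma - w_2\sin\zeta)^2
\end{aligned}
\tag{29}
$$

$N_{im}$ is the global minimum of $f(s, \alpha, \beta, \gamma)$. Assuming that $f(s, \alpha, \beta, \gamma)$ is convex in the area
that contains both the global minimum $N_{im}$ and the sub-optimal solution ($N_{af} + \tilde{N}_{im}$), we can
employ the following iterative method to compute $N_{im}$.

1.  compute $\tilde{N}_{im}$;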

2.  improve  the  solution  by  any  gradient-descent  method  until  a  local  minimum  is  obtained. 

If  the  convexity  assumption  is  correct,  this  method  returns  the  correct  image  metric,  otherwise 
it  may  return  a  sub-optimal  solution. 
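The two-step scheme can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's implementation: instead of the parametrization of Eq. (29), it writes $\vec{r}_1 = s\,\vec{e}_1$ and $\vec{r}_2 = s\,\vec{e}_2$, where $\vec{e}_1, \vec{e}_2$ are the first two columns of a rotation built from three Euler angles, which satisfies the same rigidity constraints. It minimizes the image-space objective of Eq. (6) by finite-difference gradient descent with backtracking. The names `rotation_columns`, `objective`, and `refine`, the step sizes, and the caller-supplied starting point are all hypothetical; the text proposes starting from the planar closed-form solution.

```python
import math

def rotation_columns(a, b, c):
    """First two columns of R = Rz(a) Ry(b) Rx(c); they form an orthonormal pair."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    e1 = (ca * cb, sa * cb, -sb)
    e2 = (ca * sb * sc - sa * cc, sa * sb * sc + ca * cc, cb * sc)
    return e1, e2

def objective(params, P, x, y):
    """||x - P r1||^2 + ||y - P r2||^2 with r1 = s*e1 and r2 = s*e2."""
    s, a, b, c = params
    e1, e2 = rotation_columns(a, b, c)
    total = 0.0
    for row, xi, yi in zip(P, x, y):
        px = s * sum(p * e for p, e in zip(row, e1))
        py = s * sum(p * e for p, e in zip(row, e2))
        total += (xi - px) ** 2 + (yi - py) ** 2
    return total

def refine(P, x, y, start, step=0.1, iters=500, h=1e-6):
    """Gradient descent with backtracking; the objective never increases."""
    params = list(start)
    best = objective(params, P, x, y)
    for _ in range(iters):
        # Forward-difference estimate of the gradient.
        grad = []
        for k in range(4):
            bumped = list(params)
            bumped[k] += h
            grad.append((objective(bumped, P, x, y) - best) / h)
        trial_step, improved = step, False
        while trial_step > 1e-12:
            trial = [p - trial_step * g for p, g in zip(params, grad)]
            value = objective(trial, P, x, y)
            if value < best:
                params, best, improved = trial, value, True
                break
            trial_step *= 0.5
        if not improved:
            break  # no descent step found: treat as a local minimum
    return params, best
```

As with the method described in the text, the result is only a local minimum; it equals the image metric when the starting point lies in the basin of the global minimum.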


7  Simulations 

To test the presented metric we have compared it with the alignment method. As was mentioned
in Section 2, the alignment method involves the selection of a small subset of correspondences
(alignment key), solving for the transformation using this subset, and then transforming the
rest of the points and measuring their distance from the corresponding image points. The
obtained distance critically depends on the choice of alignment key. Different choices produce
different distance measures between the model and the image. The results are almost always
sub-optimal, since it is usually better to match all points with small errors than to exactly
match a subset of points and project the errors entirely onto the others.

In our simulations, models composed of four points were projected to the image using weak
perspective projection. Gaussian noise (with standard deviation 0.05 of the radius of the 3D
object) was added to the obtained images. Using the expression for $N_{tr}$ given in Eq. (14), we
computed the upper and lower bounds on the image metric between the model and the noisy
images. In addition, we computed the corresponding alignment distances, each reflecting the
distance between one model point and its predicted projection in the image after the alignment
of the remaining three image points to the model.
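The data-generation step of such a simulation might look like the sketch below. The details are assumptions rather than the authors' setup: the rotation is given by three Euler angles, the radius is taken as the largest distance of a model point from the centroid, and the noise standard deviation is 0.05 times that radius in model units. The names `center` and `noisy_weak_perspective_view` are hypothetical.

```python
import math
import random

def center(points):
    """Translate 3D points so that their centroid is the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]

def noisy_weak_perspective_view(model, angles, scale, noise_fraction=0.05, rng=None):
    """Rotate, scale, and orthographically project the model, then add Gaussian noise."""
    rng = rng or random.Random()
    pts = center(model)
    radius = max(math.sqrt(sum(v * v for v in p)) for p in pts)
    sigma = noise_fraction * radius
    a, b, c = angles
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    # First two rows of R = Rz(a) Ry(b) Rx(c); the third row is dropped by the projection.
    row_x = (ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc)
    row_y = (sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc)
    x = [scale * sum(r * v for r, v in zip(row_x, p)) + rng.gauss(0.0, sigma) for p in pts]
    y = [scale * sum(r * v for r, v in zip(row_y, p)) + rng.gauss(0.0, sigma) for p in pts]
    return pts, x, y
```

The returned centered model and the coordinate lists `x`, `y` are the inputs needed to evaluate the bounds and the alignment distances for one run.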

The figures below summarize our results. Figure 3 shows the percentage of alignment
distances which actually lie within the bounds on the image metric computed by our metric
(given in Eq. (26)). It can be seen that when the bounds are relatively tight (when the condition
number of the characteristic matrix $B$ is relatively low) most of the alignment solutions
exceed the upper bound. Only when the condition number gets larger do the alignment distances
lie within the bounds. When a tighter upper bound is used (Eq. (27)), a smaller portion of the
alignment distances actually lie within the bounds.

Figure 3: The percentage of alignment distances which lie within the bounds on the image metric computed from
our closed-form equations. The abscissa gives the condition number of the characteristic matrix, $B$, which
determines how far apart the lower and upper bounds on the image metric are. The larger the condition number
is, the further apart the bounds are. Solid graph: alignment distances relative to the wide bounds from Eq. (26).
Dashed lines: alignment distances relative to the tight upper bound from Eq. (27).

Figure 4 shows the maximal and minimal alignment distances obtained in different runs
relative to the upper and lower bounds on the image metric, given in Eq. (26) and Eq. (27). It
can be seen that in many cases even the best alignment solution (the one that minimizes the
distance) still exceeds the upper bound.


8  Summary 

We have proposed a transformation metric to measure the similarity between 3D models and
2D images. The transformation metric measures the amount of affine deformation applied to
the object to produce the given image. A simple, closed-form solution for this metric has been
presented. This solution is optimal in transformation space, and it is used to bound the image
metric from both above and below.

Figure 4: The maximal and minimal alignment distances are plotted for a number of models and objects,
varying along the abscissa. The distances in these plots were normalized so as to obtain constant lower and
upper bounds (the lower bound is set to 1; the upper bound is set to be the average ratio of the upper bound to
the lower bound in each sequence of runs). Small (between 1.5 and 2.5) and large (between 4.5 and 5.5) condition
numbers are used, and the results are compared to both the wide (Eq. (26)) and the tight (Eq. (27)) bounds. (a)
Small condition number, wide bounds. (b) Small condition number, tight bounds. (c) Large condition number,
wide bounds. (d) Large condition number, tight bounds.

The transformation metric presented in this paper can be used in several different ways in
the recognition and classification tasks:

1.  It  provides  a  direct  assessment  of  the  similarity  between  models  and  images.  Measuring 
the  amount  of  deformation  applied  to  the  objects  makes  it  suited  for  the  task  of  object 
classification  where  the  uncertainty  in  the  structure  of  the  observed  objects  is  inherent. 

2.  The  transformation  metric  can  be  used  to  bound  the  image  metric,  the  distance  between 
the  image  and  the  closest  view  of  the  object,  from  both  above  and  below.  As  shown 
by  our  simulations,  these  bounds  often  provide  better  estimates  than  those  provided  by 
using  alignment.  Consequently,  we  believe  that  in  many  cases  the  bounds  suffice  to 
unequivocally  determine  the  identity  of  the  observed  object. 

3.  The  transformation  metric  provides  a  sub-optimal  closed-form  estimate  for  the  image 
metric.  A  scheme  which  uses  this  measure  will  prefer  “symmetric”  objects,  objects  whose 
convex-hull  is  close  to  a  sphere,  over  other  objects  which  are  significantly  stretched  or 
contracted  along  one  spatial  dimension.  This  solution  can  also  be  used  as  an  initial  guess 
in  an  iterative  process  that  computes  the  optimal  value  of  the  image  metric  numerically. 


Appendices 

A  Metric  properties 

The measures described in this paper compare entities of different dimensionalities: 3D objects
and 2D images. We define a metric for comparing such entities as follows. Let $P$ be a set of $n$
model points, and let $q$ be a set of $n$ corresponding image points. A distance function, $N(P, q)$,
defined using a difference function $d(q, q')$ between two views (see Section 3), is called a metric
if

1.  $N(P, q) \ge 0$ for every model $P$ and image $q$.

2.  $N(P, q) = 0$ if, and only if, $q$ is a rigid view of $P$.

3.  $\forall q'$, $N(P, q) \le N(P, q') + d(q, q')$

For the image metric, $N_{im}$, $d$ is simply the Euclidean distance between corresponding points in
the compared images. It is straightforward to see that the conditions hold for this case. In the
rest of this appendix we prove that these conditions also hold for the transformation metric,
$N_{tr}$.

Transformation metric

The transformation metric, $N_{tr}$, measures the amount of “affine deformation” applied to the
object in the image. The metric conditions for $N_{tr}$ are defined as follows.

1.  $N(P, q) \ge 0$ for every model $P$ and image $q$.

2.  $N(P, q) = 0$ if, and only if, there exists a rigid view which coincides with $PP^+q$. (In
other words, the best affine view of the object is a rigid view and there is no “affine
deformation”.)

3.  $\forall q'$, $N(P, q) \le N(P, q') + \|P^+(q - q')\|$

Theorem 9: $N_{tr}$ is a metric.

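Proof:

1.  $N_{tr} \ge 0$. $N_{tr}$ minimizes a non-negative distance function. It is therefore always
non-negative.

2.  $N_{tr} = 0$ if, and only if, the best affine view is rigid. Denote by $\vec{x}$ and $\vec{y}$ the x and y coordinates
of the points in $q$. According to Eq. (14),

$$
\begin{aligned}
N_{tr} = 0 &\iff (\vec{x}^T B\vec{x} + \vec{y}^T B\vec{y})^2 = 4\left(\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y} - (\vec{x}^T B\vec{y})^2\right) \\
&\iff (\vec{x}^T B\vec{x})^2 + 2(\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y}) + (\vec{y}^T B\vec{y})^2 = 4(\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y}) - 4(\vec{x}^T B\vec{y})^2 \\
&\iff (\vec{x}^T B\vec{x} - \vec{y}^T B\vec{y})^2 = -4(\vec{x}^T B\vec{y})^2
\end{aligned}
$$

This equation holds if, and only if, both sides are zero, implying that

$$\vec{x}^T B\vec{x} = \vec{y}^T B\vec{y}$$

$$\vec{x}^T B\vec{y} = 0$$

The best affine view of the object is given by $PP^+\vec{x}$, $PP^+\vec{y}$. Following Eq. (11), the best
affine view also satisfies the rigidity constraints above, and therefore it forms a rigid view.

3.  The metric $N_{tr}$ is defined in Eq. (13) as:

$$N_{tr}(P, q) = \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2 \quad \text{s.t.} \quad \vec{r}_1^T\vec{r}_2 = 0, \quad \vec{r}_1^T\vec{r}_1 = \vec{r}_2^T\vec{r}_2$$

Let $\vec{w}_1$ and $\vec{w}_2$ be the optimal vectors for $q'$, that is

$$N_{tr}(P, q') = \|P^+\vec{x}' - \vec{w}_1\|^2 + \|P^+\vec{y}' - \vec{w}_2\|^2$$

And we obtain

$$
\begin{aligned}
N_{tr}(P, q') + \|P^+\vec{x} - P^+\vec{x}'\|^2 + \|P^+\vec{y} - P^+\vec{y}'\|^2 &= \|P^+\vec{x}' - \vec{w}_1\|^2 + \|P^+\vec{y}' - \vec{w}_2\|^2 + \|P^+\vec{x} - P^+\vec{x}'\|^2 + \|P^+\vec{y} - P^+\vec{y}'\|^2 \\
&\ge \|P^+\vec{x} - \vec{w}_1\|^2 + \|P^+\vec{y} - \vec{w}_2\|^2 \\
&\ge \min_{\vec{r}_1, \vec{r}_2 \in \mathbb{R}^3} \|P^+\vec{x} - \vec{r}_1\|^2 + \|P^+\vec{y} - \vec{r}_2\|^2 = N_{tr}(P, q)
\end{aligned}
$$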


B  The  computation  of  the  characteristic  matrix 

In Eq. (10) the characteristic matrix $B$ was defined using the matrix of Euclidean model point
coordinates $P$. We now give a more general (though equivalent) definition of $B$ using a matrix of
affine model point coordinates $Q$. Namely, the point coordinates in $Q$ are given in a coordinate
system whose axes are not necessarily orthonormal. This definition makes it possible to compute
$B$ directly from three or more images with a completely linear algorithm, which requires no
more than a pseudo-inverse.

We select an affine coordinate system whose independent axes are defined by three of the
object points, to be called the basis points. Let $P_{bas}$ denote the submatrix of $P$ corresponding
to the coordinates of the basis points, and let $Q$ denote the affine coordinates of all the object
points in this basis. It immediately follows that:

$$P = Q \cdot P_{bas}$$

Let $B_{bas}$ denote the characteristic matrix of the three basis points. From Eq. (10) it follows that

$$B_{bas} = (P_{bas}^{-1})^T P_{bas}^{-1} \tag{30}$$

Finally, from the definition of pseudo-inverse it can be readily verified that

$$P^+ = (Q P_{bas})^+ = P_{bas}^{-1} Q^+ \tag{31}$$

We now describe $B$ in terms of $Q$ and $B_{bas}$. Substituting Eq. (31) into the definition of $B$
in Eq. (10), and using Eq. (30), we obtain

$$B = (P^+)^T P^+ = (Q^+)^T B_{bas}\, Q^+$$

The linear and incremental computation of the matrices $Q$ and $B_{bas}$ from at least three
images of the object points is described in [Weinshall and Tomasi, 1992].
C  Eliminating translation


In this appendix we show that translation can be ignored if we set the centroids of both model
and image points to be the origin. To show this, we prove that the best rigid and affine
transformations map the model centroid to the image centroid. We begin by showing that,
given two sets of n 2D points (images), the best translation that relates the two images maps
the centroid of the first image to that of the second.

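Lemma 10: Let $p_1, \ldots, p_n \in \mathbb{R}^2$ and $q_1, \ldots, q_n \in \mathbb{R}^2$ be two sets of corresponding points.
Denote by $\bar{p} = \frac{1}{n}\sum_{i=1}^n p_i$ and $\bar{q} = \frac{1}{n}\sum_{i=1}^n q_i$ the centroids of $p_1, \ldots, p_n$ and $q_1, \ldots, q_n$ respectively.
The translation $t^* \in \mathbb{R}^2$ that minimizes the term

$$D = \sum_{i=1}^n \|p_i + t - q_i\|^2$$

is given by

$$t^* = \bar{q} - \bar{p}$$

Proof: Assume, by way of contradiction, that the best translation is given by

$$t' = t^* + \delta$$

for some nonzero $\delta \in \mathbb{R}^2$. Denote by $D^*$ the value of the term at $t^*$ and by $D'$ the new term

$$
\begin{aligned}
D' &= \sum_{i=1}^n \|p_i + t^* + \delta - q_i\|^2 \\
&= \sum_{i=1}^n \|p_i + t^* - q_i\|^2 + 2\sum_{i=1}^n (p_i + t^* - q_i) \cdot \delta + \sum_{i=1}^n \|\delta\|^2 \\
&= D^* + 2n(\bar{p} + t^* - \bar{q}) \cdot \delta + n\|\delta\|^2
\end{aligned}
$$

Since $t^* = \bar{q} - \bar{p}$, we obtain that

$$\bar{p} + t^* - \bar{q} = 0$$

and, therefore,

$$D' = D^* + n\|\delta\|^2$$

which implies that

$$D^* < D'$$

contradicting the initial assumption.
□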
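Lemma 10 is easy to check numerically. The sketch below is an illustrative check rather than part of the original derivation; the function names `centroid`, `best_translation`, and `cost` are hypothetical. It computes the centroid-to-centroid translation and the sum of squared residuals for any translation, so that the identity $D' = D^* + n\|\delta\|^2$ from the proof can be verified directly.

```python
def centroid(points):
    """Mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def best_translation(p, q):
    """Translation t* = q_bar - p_bar, which minimizes sum ||p_i + t - q_i||^2."""
    pb, qb = centroid(p), centroid(q)
    return (qb[0] - pb[0], qb[1] - pb[1])

def cost(p, q, t):
    """D(t): sum of squared distances after translating every p_i by t."""
    return sum((pi[0] + t[0] - qi[0]) ** 2 + (pi[1] + t[1] - qi[1]) ** 2
               for pi, qi in zip(p, q))
```

For any perturbation `delta` added to the returned translation, `cost` increases by exactly `len(p)` times the squared length of `delta`, up to rounding.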


Using Lemma 10 we prove that the best rigid and affine transformations map the model
centroid to the image centroid.

Theorem 11: Let $P_1, \ldots, P_n \in \mathbb{R}^3$ be a set of $n$ model points, and let $q_1, \ldots, q_n \in \mathbb{R}^2$ be the
corresponding $n$ image points. The rigid transformation $\{s^*, R^*, t^*\}$ that minimizes the term

$$D^* = \min_{\{s, R, t\}} \sum_{i=1}^n \|s\Pi R P_i + t - q_i\|^2$$

where $\Pi$ denotes the orthographic projection, satisfies

$$\bar{q} = s^*\Pi R^*\bar{P} + t^*$$

Proof: Denote $p_i = s^*\Pi R^* P_i$. According to Lemma 10,

$$t^* = \bar{q} - \bar{p}$$

Since

$$\bar{p} = \frac{1}{n}\sum_{i=1}^n p_i = \frac{1}{n}\sum_{i=1}^n s^*\Pi R^* P_i = s^*\Pi R^*\bar{P}$$

we obtain that

$$\bar{q} = \bar{p} + t^* = s^*\Pi R^*\bar{P} + t^*$$

The theorem holds also if we consider affine transformations rather than only the rigid ones.
The rotation matrix $R$ is replaced in this case by a general linear transformation $A$.

□

Theorem 11 shows that the best rigid and affine transformations map the model centroid to
the image centroid. Consequently, if the two centroids are moved to the origin, the translation
component vanishes. This follows immediately from Theorem 11, since

$$\bar{q} = s^*\Pi R^*\bar{P} + t^*$$

then

$$\bar{P} = \bar{q} = 0$$

implies

$$t^* = 0$$

D  Best  View 


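In this appendix we develop an expression for the best view of the transformation metric, $N_{tr}$.
The derivations here follow the notations used in the proof of Theorem 1, from which we have
that

$$
\begin{aligned}
s &= \frac{1}{2}\sqrt{w_1^2 + w_2^2 + 2w_1w_2\sin\theta} \\
s\cos\alpha &= \frac{1}{2}(w_1 + w_2\sin\theta) \\
s\sin\alpha &= \frac{1}{2}w_2\cos\theta
\end{aligned}
$$

According to Theorem 2, $\vec{r}_1, \vec{r}_2 \in \mathrm{span}\{\vec{a}_1, \vec{a}_2\}$. We can therefore express $\vec{r}_1$ and $\vec{r}_2$ by

$$\vec{r}_1 = \beta_1\vec{a}_1 + \beta_2\vec{a}_2$$

$$\vec{r}_2 = \gamma_1\vec{a}_1 + \gamma_2\vec{a}_2$$

where $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are scalars. Substituting the definitions of the vectors $\vec{r}_1$, $\vec{r}_2$, $\vec{a}_1$, and
$\vec{a}_2$ we obtain

$$
\begin{aligned}
s\cos\alpha &= \beta_1w_1 + \beta_2w_2\cos\theta \\
-s\sin\alpha &= \beta_2w_2\sin\theta
\end{aligned}
$$

and

$$
\begin{aligned}
s\sin\alpha &= \gamma_1w_1 + \gamma_2w_2\cos\theta \\
s\cos\alpha &= \gamma_2w_2\sin\theta
\end{aligned}
$$

Therefore

$$
\begin{aligned}
\beta_1 &= \frac{s\sin\alpha\cos\theta + s\cos\alpha\sin\theta}{w_1\sin\theta} \\
\beta_2 &= -\frac{s\sin\alpha}{w_2\sin\theta} \\
\gamma_1 &= \frac{s\sin\alpha\sin\theta - s\cos\alpha\cos\theta}{w_1\sin\theta} \\
\gamma_2 &= \frac{s\cos\alpha}{w_2\sin\theta}
\end{aligned}
$$

Substituting for $s$ and $\alpha$ we obtain

$$
\begin{aligned}
\beta_1 &= \frac{1}{2}\left(1 + \frac{w_2}{w_1\sin\theta}\right) \\
\beta_2 = \gamma_1 &= -\frac{\cos\theta}{2\sin\theta} \\
\gamma_2 &= \frac{1}{2}\left(1 + \frac{w_1}{w_2\sin\theta}\right)
\end{aligned}
$$

And substituting for $w_1$, $w_2$, and $\theta$

$$
\begin{aligned}
\beta_1 &= \frac{1}{2}\left(1 + \frac{\vec{y}^T B\vec{y}}{\sqrt{\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y} - (\vec{x}^T B\vec{y})^2}}\right) \\
\beta_2 = \gamma_1 &= -\frac{\vec{x}^T B\vec{y}}{2\sqrt{\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y} - (\vec{x}^T B\vec{y})^2}} \\
\gamma_2 &= \frac{1}{2}\left(1 + \frac{\vec{x}^T B\vec{x}}{\sqrt{\vec{x}^T B\vec{x} \cdot \vec{y}^T B\vec{y} - (\vec{x}^T B\vec{y})^2}}\right)
\end{aligned}
$$

Now, to obtain the best view we use the following identities

$$\vec{x}^* = P\vec{r}_1 \qquad \vec{r}_1 = \beta_1\vec{a}_1 + \beta_2\vec{a}_2 \qquad \vec{a}_1 = P^+\vec{x}$$

$$\vec{y}^* = P\vec{r}_2 \qquad \vec{r}_2 = \gamma_1\vec{a}_1 + \gamma_2\vec{a}_2 \qquad \vec{a}_2 = P^+\vec{y}$$

Therefore

$$\vec{x}^* = PP^+(\beta_1\vec{x} + \beta_2\vec{y})$$

$$\vec{y}^* = PP^+(\gamma_1\vec{x} + \gamma_2\vec{y})$$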

E  Computing  the  eigenvalues  of  an  ellipse 

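In this appendix we compute the eigenvalues of the ellipsoid $B$ and the eigenvalues of an elliptic
section of this ellipsoid.

We first show that the eigenvalues of the characteristic matrix, $B$, are $\frac{1}{\lambda_1}$, $\frac{1}{\lambda_2}$, and $\frac{1}{\lambda_3}$,
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the three positive eigenvalues of $P^TP$. This is derived as follows.

$$B\vec{a} = P(P^TP)^{-1}(P^TP)^{-1}P^T\vec{a} = \frac{1}{\lambda}\vec{a}$$

Multiplying both sides by $P^T$ we obtain that

$$(P^TP)^{-1}P^T\vec{a} = \frac{1}{\lambda}P^T\vec{a}$$

Denote $\vec{b} = P^T\vec{a}$

$$(P^TP)^{-1}\vec{b} = \frac{1}{\lambda}\vec{b}$$

which implies that

$$(P^TP)\vec{b} = \lambda\vec{b}$$

Given $\vec{X} = PP^+\vec{x}$ and $\vec{Y} = PP^+\vec{y}$ in $\mathbb{R}^3$, and a positive definite $3 \times 3$ matrix $B$, let $B'$
denote the ellipse defined by the intersection of the ellipsoid $B$ with the plane $\mathrm{span}\{\vec{X}, \vec{Y}\}$. We
need to find the eigenvalues of $B'$, $\frac{1}{\mu_1}$ and $\frac{1}{\mu_2}$.

Without loss of generality we assume that $\vec{X}$ and $\vec{Y}$ lie on the ellipsoid defined by $B$ (namely,
we normalize the vectors so that $\vec{X}^T B\vec{X} = \vec{x}^T B\vec{x} = 1$ and $\vec{Y}^T B\vec{Y} = \vec{y}^T B\vec{y} = 1$). Let $\theta$ denote
the angle between $\vec{X}$ and $\vec{Y}$. We define two orthonormal vectors $\hat{x}$ and $\hat{y}$, which span the
plane $\mathrm{span}\{\vec{X}, \vec{Y}\}$, as follows:

$$\hat{x} = \frac{\vec{X}}{|X|} \qquad \hat{y} = \frac{|X|\vec{Y} - |Y|\cos\theta\,\vec{X}}{|X||Y|\sin\theta}$$

Every vector $\vec{v} \in \mathrm{span}\{\vec{X}, \vec{Y}\}$ can be written as

$$\vec{v} = \alpha\hat{x} + \beta\hat{y}$$

and the intersection ellipse $B'$ is given by

$$\vec{v}^T B\vec{v} = 1 \iff (\alpha \;\; \beta)\, A^T B A \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = 1$$

for $A$ the $3 \times 2$ matrix whose columns are $\hat{x}$ and $\hat{y}$. We therefore have that

$$B' = A^T B A = \begin{pmatrix} \hat{x}^T B\hat{x} & \hat{x}^T B\hat{y} \\ \hat{x}^T B\hat{y} & \hat{y}^T B\hat{y} \end{pmatrix}$$

Substituting the expressions for $\hat{x}$ and $\hat{y}$, we get

$$
\begin{aligned}
\hat{x}^T B\hat{x} &= \frac{1}{|X|^2} \\
\hat{y}^T B\hat{y} &= \frac{|X|^2 - 2|X||Y|\cos\theta\,(\vec{X}^T B\vec{Y}) + |Y|^2\cos^2\theta}{|X|^2|Y|^2\sin^2\theta} \\
\hat{x}^T B\hat{y} &= \frac{(\vec{X}^T B\vec{Y})|X| - |Y|\cos\theta}{|X|^2|Y|\sin\theta}
\end{aligned}
$$

To obtain the two eigenvalues of $B'$, $\frac{1}{\mu_1}$ and $\frac{1}{\mu_2}$, we solve the characteristic equation of $B'$,
whose roots are

$$\frac{|X|^2 + |Y|^2 - 2|X||Y|\cos\theta \cdot \kappa \pm \sqrt{(|X|^2 + |Y|^2 - 2|X||Y|\cos\theta \cdot \kappa)^2 - 4|X|^2|Y|^2\sin^2\theta\,(1 - \kappa^2)}}{2|X|^2|Y|^2\sin^2\theta}$$

for $\kappa = \vec{X}^T B\vec{Y} = \vec{x}^T B\vec{y}$, $|X| = |PP^+\vec{x}|$, $|Y| = |PP^+\vec{y}|$, and $\cos\theta = \frac{\vec{X}^T\vec{Y}}{|X||Y|}$.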